Concept-Based Visual Analysis of Dynamic Textual Data
Analyzing how interrelated ideas flow within and between multiple social
groups helps understand the propagation of information, ideas, and thoughts on
social media. Existing dynamic text analysis work on idea flow is mostly based
on topic models. As a result, when analyzing the reasons behind a flow of ideas,
users have to inspect the underlying texts, which is tedious given their huge
volume and complex structure. To
solve this problem, we propose a concept-based dynamic visual text analytics
method, which illustrates how the content of the ideas changes and helps users
analyze the root cause of the idea flow. We use concepts to summarize the
content of the ideas and depict their flow with flow lines. To
ensure the stability of the flow lines, a constrained t-SNE projection
algorithm is used to display the change of concepts over time and the
correlation between them. To better convey anomalous changes in the concepts,
we propose an anomaly-detection-based method that identifies and highlights the
time periods in which concepts change anomalously. A qualitative
evaluation and a case study on real-world Twitter datasets demonstrate the
correctness and effectiveness of our visual analytics method. Comment: in Chinese language.
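The anomalous-change detection step can be illustrated with a minimal sketch: flag time steps where a concept's frequency deviates sharply from its recent history. A simple z-score detector is assumed here; the function name, window size, and threshold are illustrative, not necessarily the paper's exact method.

```python
import numpy as np

def anomalous_periods(freq_series, window=5, z_thresh=3.0):
    """Return indices of time steps whose concept frequency deviates
    strongly (z-score above z_thresh) from the trailing window."""
    x = np.asarray(freq_series, dtype=float)
    flagged = []
    for t in range(window, len(x)):
        hist = x[t - window:t]
        mu, sigma = hist.mean(), hist.std()
        sigma = max(sigma, 1e-9)  # guard against constant history
        if abs(x[t] - mu) / sigma > z_thresh:
            flagged.append(t)
    return flagged
```

Flagged periods would then be highlighted along the concept flow lines.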
From Capture to Display: A Survey on Volumetric Video
Volumetric video, which offers immersive viewing experiences, is gaining
increasing prominence. With its six degrees of freedom, it provides viewers
with greater immersion and interactivity compared to traditional videos.
Despite its potential, providing volumetric video services poses significant
challenges. This survey conducts a comprehensive review of the existing
literature on volumetric video. We first provide a general framework of
volumetric video services, followed by a discussion on prerequisites for
volumetric video, encompassing representations, open datasets, and quality
assessment metrics. Then we delve into the current methodologies for each stage
of the volumetric video service pipeline, detailing capturing, compression,
transmission, rendering, and display techniques. Lastly, we explore various
applications enabled by this pioneering technology and present an array of
research challenges and opportunities in the domain of volumetric video
services. This survey aspires to provide a holistic understanding of this
burgeoning field and shed light on potential future research trajectories,
aiming to bring the vision of volumetric video to fruition. Comment: Submitted
Deep Learning for Edge Computing Applications: A State-of-the-Art Survey
With the booming development of the Internet of Things (IoT) and communication technologies such as 5G, our future world is envisioned as an interconnected entity where billions of devices will provide uninterrupted service to our daily lives and the industry. Meanwhile, these devices will generate massive amounts of valuable data at the network edge, calling for not only instant data processing but also intelligent data analysis in order to fully unleash the potential of edge big data. Neither traditional cloud computing nor on-device computing can sufficiently address this problem, due to the high latency and the limited computation capacity, respectively. Fortunately, the emerging edge computing paradigm sheds light on the issue by pushing data processing from the remote network core to the local network edge, remarkably reducing latency and improving efficiency. Besides, the recent breakthroughs in deep learning have greatly improved data processing capacity, enabling the development of novel applications such as video surveillance and autonomous driving. The convergence of edge computing and deep learning is believed to bring new possibilities to both interdisciplinary research and industrial applications. In this article, we provide a comprehensive survey of the latest efforts on deep-learning-enabled edge computing applications and particularly offer insights on how to leverage deep learning advances to facilitate edge applications in four domains, i.e., smart multimedia, smart transportation, smart city, and smart industry. We also highlight the key research challenges and promising research directions therein. We believe this survey will inspire more research and contributions in this promising field.
Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation
Multimodal learning has seen great success mining data features from multiple
modalities with remarkable model performance improvement. Meanwhile, federated
learning (FL) addresses the data sharing problem, enabling privacy-preserving
collaborative training that pools sufficient valuable data. Great potential
therefore arises from their confluence, known as multimodal federated learning.
However, predominant approaches are limited in that they often assume that each
local dataset records samples from all modalities. In this
paper, we aim to bridge this gap by proposing an Unimodal Training - Multimodal
Prediction (UTMP) framework under the context of multimodal federated learning.
We design HA-Fedformer, a novel transformer-based model that empowers unimodal
training with only a unimodal dataset at the client and multimodal testing by
aggregating multiple clients' knowledge for better accuracy. The key advantages
are twofold. Firstly, to alleviate the impact of data non-IID, we develop an
uncertainty-aware aggregation method for the local encoders with layer-wise
Markov Chain Monte Carlo sampling. Secondly, to overcome the challenge of
unaligned language sequences, we implement a cross-modal decoder aggregation to
capture the hidden signal correlation between decoders trained by data from
different modalities. Our experiments on popular sentiment analysis benchmarks,
CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms
state-of-the-art multimodal models under the UTMP federated learning
framework, with a 15%-20% improvement on most attributes. Comment: 10 pages, 5 figures.
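The uncertainty-aware aggregation idea, weighting each client's parameter estimate by the inverse variance of its posterior samples, can be sketched as follows. The function and its inverse-variance weighting are illustrative assumptions, not HA-Fedformer's exact layer-wise MCMC scheme.

```python
import numpy as np

def uncertainty_weighted_average(client_samples):
    """Aggregate per-client parameter estimates, giving more weight to
    clients whose MCMC samples show lower variance (less uncertainty).

    client_samples: list of arrays of shape (num_draws, dim), one per client.
    """
    means, weights = [], []
    for samples in client_samples:
        s = np.asarray(samples, dtype=float)
        uncertainty = s.var(axis=0).mean() + 1e-8  # avoid division by zero
        means.append(s.mean(axis=0))
        weights.append(1.0 / uncertainty)
    weights = np.asarray(weights)
    weights /= weights.sum()
    return np.average(np.stack(means), axis=0, weights=weights)
```

A confident client (tightly clustered samples) thus dominates the aggregate over an uncertain one.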
Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction
Volumetric video has emerged as an attractive new video paradigm in recent
years, since it provides an immersive and interactive 3D viewing experience with
six degrees of freedom (DoF). Unlike traditional 2D or panoramic videos, volumetric
videos require dense point clouds, voxels, meshes, or huge neural models to
depict volumetric scenes, which results in a prohibitively high bandwidth
burden for video delivery. Users' behavior analysis, especially the viewport
and gaze analysis, then plays a significant role in prioritizing the content
streaming within users' viewport and degrading the remaining content to
maximize user QoE with limited bandwidth. Although understanding user behavior
is crucial, to the best of our knowledge, there are no available 3D
volumetric video viewing datasets containing fine-grained user interactivity
features, not to mention further analysis and behavior prediction. In this
paper, we release, for the first time, a volumetric video viewing behavior
dataset, with a large scale, multiple dimensions, and diverse conditions. We
conduct an in-depth analysis to understand user behaviors when viewing
volumetric videos. Interesting findings on user viewport, gaze, and motion
preference related to different videos and users are revealed. We finally
design a transformer-based viewport prediction model that fuses the features of
both gaze and motion, which is able to achieve high accuracy under various
conditions. Our prediction model is expected to further benefit volumetric
video streaming optimization. Our dataset, along with the corresponding
visualization tools, is accessible at
https://cuhksz-inml.github.io/user-behavior-in-vv-watching/ Comment: Accepted by ACM MM'2
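The gaze-and-motion fusion idea can be illustrated with a toy cross-attention sketch, in which motion tokens attend to gaze tokens and the last fused token is read out as the next-viewport estimate. This is an untrained, minimal illustration of attention-based fusion, not the paper's actual transformer model; all names and shapes are assumptions.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention with a softmax over key positions."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def fuse_and_predict(gaze_hist, motion_hist):
    """Fuse a motion history (T, d) with a gaze history (T, d) via one
    cross-attention pass and return the last fused token as the
    next-viewport estimate."""
    fused = motion_hist + attention(motion_hist, gaze_hist, gaze_hist)
    return fused[-1]
```

In a trained model, the attention projections would be learned and stacked into transformer layers; the residual-plus-attention pattern shown here is the core building block.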
LiveVV: Human-Centered Live Volumetric Video Streaming System
Volumetric video has emerged as a prominent medium within the realm of
eXtended Reality (XR) with the advancements in computer graphics and depth
capture hardware. Users can fully immerse themselves in volumetric video with
the ability to switch their viewport in six degrees of freedom (DoF), including
three rotational dimensions (yaw, pitch, roll) and three translational
dimensions (X, Y, Z). Different from traditional 2D videos that are composed of
pixel matrices, volumetric videos employ point clouds, meshes, or voxels to
represent a volumetric scene, resulting in significantly larger data sizes.
While previous works have successfully achieved volumetric video streaming in
video-on-demand scenarios, the live streaming of volumetric video remains an
unresolved challenge due to the limited network bandwidth and stringent latency
constraints. In this paper, we propose, for the first time, a holistic live
volumetric video streaming system, LiveVV, which achieves multi-view capture,
scene segmentation & reuse, adaptive transmission, and rendering. LiveVV
contains multiple lightweight volumetric video capture modules that are capable
of being deployed without prior preparation. To reduce bandwidth consumption,
LiveVV processes static and dynamic volumetric content separately by reusing
static data with low disparity and decimating data with low visual saliency.
Besides, to deal with network fluctuation, LiveVV integrates a volumetric video
adaptive bitrate streaming algorithm (VABR) to enable fluent playback with the
maximum quality of experience. Extensive real-world experiments show that
LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps
with a latency of less than 350 ms.
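A generic throughput-based selection rule of the kind adaptive bitrate algorithms such as VABR build on can be sketched as follows; the bitrate ladder, safety factor, and buffer threshold are illustrative assumptions, and VABR's real policy is additionally aware of volumetric content (e.g. saliency and the static/dynamic split).

```python
def select_bitrate(ladder_kbps, est_bandwidth_kbps, buffer_s,
                   safety=0.8, low_buffer_s=1.0):
    """Pick the highest rung of the bitrate ladder that fits within a
    safety-discounted bandwidth estimate; fall back to the lowest rung
    when the playback buffer is nearly empty."""
    if buffer_s < low_buffer_s:
        return min(ladder_kbps)          # protect against rebuffering
    budget = est_bandwidth_kbps * safety
    feasible = [b for b in ladder_kbps if b <= budget]
    return max(feasible) if feasible else min(ladder_kbps)
```

The safety factor absorbs bandwidth estimation error; the buffer check trades momentary quality for fluent playback.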
RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN
Second-order training methods can converge much faster than first-order
optimizers in DNN training. This is because the second-order training utilizes
the inversion of the second-order information (SOI) matrix to find a more
accurate descent direction and step size. However, the huge SOI matrices bring
significant computational and memory overheads in the traditional architectures
like GPUs and CPUs. On the other hand, the ReRAM-based processing-in-memory
(PIM) technology is suitable for second-order training for three reasons:
first, PIM's computation happens in memory, which reduces data movement
overheads; second, ReRAM crossbars can compute SOI's inversion in time; third,
if architected properly, ReRAM crossbars can perform matrix inversion and
vector-matrix multiplications, which are important to second-order training
algorithms.
Nevertheless, current ReRAM-based PIM techniques still face a key challenge
for accelerating second-order training: the existing ReRAM-based matrix
inversion circuitry only supports 8-bit matrix inversion, a precision
insufficient for second-order training, which needs at least 16-bit accurate
matrix inversion. In this work, we propose a
method to achieve high-precision matrix inversion based on a proven 8-bit
matrix inversion (INV) circuitry and vector-matrix multiplication (VMM)
circuitry. We design RePAST, a ReRAM-based PIM accelerator architecture
for second-order training. Moreover, we propose a software mapping scheme for
RePAST to further optimize performance by fusing the VMM and INV crossbars.
Experiments show that RePAST can achieve an average of 115.8×/11.4× speedup and
41.9×/12.8× energy saving compared to a GPU counterpart and PipeLayer on
large-scale DNNs. Comment: 13 pages, 13 figures.
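One standard way to lift a low-precision inverse to higher precision using only matrix multiplications (the operation VMM crossbars are good at) is Newton-Schulz iteration. The sketch below is a numerical illustration of that principle, not RePAST's circuit-level scheme; the function and its parameters are assumptions.

```python
import numpy as np

def refine_inverse(a, x0, iters=10):
    """Newton-Schulz refinement: starting from a coarse inverse x0 of a
    (e.g. from an 8-bit INV array), each step x <- x @ (2I - a @ x)
    roughly doubles the number of accurate bits, provided the initial
    residual ||I - a @ x0|| is below 1."""
    n = a.shape[0]
    x = x0.copy()
    two_eye = 2.0 * np.eye(n)
    for _ in range(iters):
        x = x @ (two_eye - a @ x)
    return x
```

Because each iteration is just two matrix multiplies, a PIM design can keep the refinement entirely in crossbar-friendly operations.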
Exploring the Applicability and Scaling Effects of Satellite-Observed Spring and Autumn Phenology in Complex Terrain Regions Using Four Different Spatial Resolution Products
The information on land surface phenology (LSP) has been extracted from remote sensing data in many studies. However, few studies have evaluated the impacts of satellite products with different spatial resolutions on LSP extraction over regions with heterogeneous topography. To bridge this knowledge gap, this study took the Loess Plateau as an example region and employed four types of satellite data with different spatial resolutions (250, 500, and 1000 m MODIS NDVI during the period 2001–2020 and ~10 km GIMMS3g during the period 1982–2015) to investigate the LSP changes that took place. We used the correlation coefficient (r) and root mean square error (RMSE) to evaluate the performances of the various satellite products and further analyzed the applicability of the four satellite products. Our results showed that the MODIS-based start of the growing season (SOS) and end of the growing season (EOS) were highly correlated with the ground-observed data, with r values of 0.82 and 0.79, respectively (p < 0.01), whereas the corresponding GIMMS3g-based correlations were not statistically significant (p > 0.05). Spatially, the LSP derived from the MODIS products produced more reasonable spatial distributions. The inter-annual averaged MODIS SOS and EOS presented overall advanced and delayed trends, respectively, during the period 2001–2020. More than two-thirds of the SOS advances and EOS delays occurred in grasslands, which determined the overall phenological changes across the entire Loess Plateau. However, both inter-annual trends of SOS and EOS derived from the GIMMS3g data were opposite to those seen in the MODIS results. There were no significant differences among the three MODIS datasets (250, 500, and 1000 m), with a bias lower than 2 days, an RMSE lower than 1 day, and a correlation coefficient greater than 0.95 (p < 0.01). Furthermore, it was found that deriving phenology from data with a 1000 m spatial resolution was feasible even in regions with heterogeneous topography.
Yet, in forest ecosystems and areas with an accumulated temperature ≥10 °C, the differences in phenological phase between the MODIS products could be amplified.
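The two evaluation metrics used above, the Pearson correlation coefficient (r) and the root mean square error (RMSE), can be computed with a minimal sketch; the paired day-of-year arrays below are hypothetical inputs.

```python
import numpy as np

def evaluate_phenology(satellite_doy, ground_doy):
    """Compare satellite-derived phenological dates (day of year) with
    paired ground observations using Pearson's r and RMSE."""
    s = np.asarray(satellite_doy, dtype=float)
    g = np.asarray(ground_doy, dtype=float)
    r = np.corrcoef(s, g)[0, 1]
    rmse = float(np.sqrt(np.mean((s - g) ** 2)))
    return r, rmse
```

High r with low RMSE indicates that a product both tracks inter-site variation and stays close to the observed dates in absolute terms.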